Handwritten Nastaleeq Script Recognition with BLSTM-CTC and ANFIS method
Authors
Abstract
Recurrent neural networks (RNNs) have been successfully applied to the recognition of cursive handwritten documents in both English and Arabic scripts. The ability of RNNs to model context in sequence data such as speech and text makes them suitable candidates for developing OCR systems for printed Nastaleeq script, for which no OCR system is available to date. In this work, we present the results of applying an RNN to printed Urdu text in Nastaleeq script. A Bidirectional Long Short-Term Memory (BLSTM) architecture with a Connectionist Temporal Classification (CTC) output layer was employed to recognize printed Urdu text. The proposed method uses a multidimensional BLSTM together with the ANFIS (adaptive neuro-fuzzy inference system) method for recognition. ANFIS is an adaptive network, that is, a network of nodes and directional links that learns a relationship between inputs and outputs; the ANFIS approach learns its rules and membership functions from data. The recognition error rate is 5.4%. These results were obtained on the synthetically generated UPTI dataset, which contains clean images along with artificially degraded images that reflect some real-world scanning artifacts. A comparison with a shape-matching based method is also presented.
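The abstract does not say how the CTC output layer is decoded into text. As a minimal sketch, assuming the simplest common choice (best-path decoding: take the most likely label in each frame, merge consecutive repeats, then drop the blank label), the step could look like the following. The function name, the toy alphabet, and the frame scores are illustrative assumptions, not taken from the paper.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding (illustrative sketch, not the paper's code).

    frame_scores: one list of per-label scores for each time step.
    alphabet:     maps a label index to its character; index `blank`
                  is the CTC blank and is never emitted.
    """
    # Pick the highest-scoring label index in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]

    decoded = []
    previous = None
    for label in best_path:
        # Emit a character only when the label changes and is not blank.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Hypothetical toy input: 3 labels (blank, "a", "b") over 5 frames.
alphabet = ["", "a", "b"]
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank (separates the two "a" characters)
    [0.2, 0.7, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
]
print(ctc_greedy_decode(frames, alphabet))
```

The blank label is what lets a CTC layer output doubled characters: without the blank frame in the middle, the two runs of "a" would be merged into one.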
Similar resources
Handwritten Urdu Character Recognition using 1-Dimensional BLSTM Classifier
The recognition of cursive script is regarded as a subtle task in optical character recognition due to its varied representation. Every cursive script has a different nature and associated challenges. As Urdu is a cursive language derived from Arabic script, it shares nearly the same challenges and difficulties, some of them even harder. We can categorize Urdu and Arabic language ...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition
Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo articulatory features are a way of extracting articulatory features using articulatory classifiers trained from speech data. One of the major problems faced in building articulatory classifiers is the requirement of speech data aligned in terms of articulatory fea...
Recurrent neural networks based Indic word-wise script identification using character-wise training
This paper presents a novel methodology of Indic handwritten script recognition using Recurrent Neural Networks and addresses the problem of script recognition in poor data scenarios, such as when only character level online data is available. It is based on the hypothesis that curves of online character data comprise sufficient information for prediction at the word level. Online character dat...
From Recurrent Neural Network to Long Short Term Memory Architecture
Despite more than 30 years of handwriting recognition research, recognizing unconstrained sequences is still a challenging task. The difficulty of segmenting cursive script has led to low recognition rates. Hidden Markov Models (HMMs) are considered state-of-the-art methods for performing non-constrained handwriting recognition. However, HMMs have several well-known drawbacks. One of thes...
Publication date: 2014